VOICE DATA TRANSMITTING AND RECEIVING SYSTEM
BACKGROUND OF THE INVENTION 

This application claims the benefit of Japanese Patent Application
No. 2002-349621 filed on December 2, 2002, the contents of which are
incorporated by reference herein.

The present invention relates to voice data transmitting and
receiving systems and, more particularly, to a voice data
transmitting and receiving system capable of securing the meaning of data
transmitted via a communication path, such as a quality of service (QoS)
non-guaranteed network that may be, for instance, the Internet.

The Internet is in common use across borders and all over the world.
Electronic commerce transactions and Internet telephone, i.e., Internet
protocol (IP) telephone, are attracting attention aside from such
conventional applications as reading of web pages, electronic mail,
and file transfer. This is greatly attributable to rapid advancement of not
only networks centered on line exchange in telephone networks, but also
IP networks based on packet exchange.

In some IP telephone communication, various data including
voice (or FAX) data and also data of still images and motion picture
images are converted to IP packets to be transferred in an IP-based
network. What is called Internet telephone is the utilization, in
part or all of the network, of the same IP network, i.e., the communication
network for communication in Internet protocol as is utilized for such
applications as the World Wide Web, for voice telephone service utilizing
IP network techniques.

Among IP telephone systems are the following three different
systems. In a first one of these systems, voice messages are exchanged
between personal computers which are dial-up interconnected on
the Internet. In this system, it is necessary that the same software is
installed in the personal computers, which are in turn connected to a
server. In a second system, communication cannot be obtained
unless a telephone call is provided from a personal computer to a usual
subscribed telephone set (converse call being impossible) or
prearrangements are made between the two sides. As a third system,
two systems are present. In one of these systems, communication is
made by inputting a user ID and a PIN via an Internet telephone
gateway to a point of connection between an Internet network for
communication between subscribers' telephone sets
and a public telephone line switchboard. The other system is one for
communication between direct Internet-coupled terminals. These
systems are closest to the present telephone communication system, and
their technical advancement is outstanding.

In the meantime, a system for transmitting a great deal of voice 
data in a narrow band has been proposed, in which on the transmission 
side an input voice is converted by voice recognition to character data, 
which are packeted and then transmitted, and on the reception side the 
received character data is converted to voice data, followed by voice 
synthesis and output of the resultant data as voice, thereby greatly 
reducing the transmitted data quantity and avoiding communication
delay (see, for instance, Literature 1: Japanese patent laid-open Hei
10-285275). This system, however, although it has the advantage of
reducing the transmitted data quantity, is based on character data transfer.
Therefore, the voice obtained by the synthesis has a fixed character, and
is different in character from the speaker's voice. 



In IP voice communication via an IP network such as the
Internet or a local network without guaranteed QoS as communication
quality, usually Real Time Communication Packets (RTPs) over the User
Datagram Protocol (UDP) are used for transmission and reception of voice
data. In this case, although RTPs are used with importance attached to
the real-time property of data in voice communication and motion picture
playback, the RTP provides no measure against packet loss
occurring on the communication path, and lost packets are not
re-transferred, thus posing problems in voice quality, such as
interruptions of voice.

To cope with these problems, heretofore, a system has been
proposed, in which RTPs are transmitted together with preceding and
succeeding packet data for an interpolating process according thereto, so
that the voice will not be interrupted even in a packet loss event.
However, in an environment in which data communication
other than voice is frequently present, voice packet loss is pronounced,
and the voice quality deterioration is too significant even with the
interpolation, sometimes resulting in failure to recognize the meaning of
the speech.

As shown above, real-time voice communication by packet
transmission is subject to loss of RTPs due to
deterioration of the communication path environment.
SUMMARY OF THE INVENTION
An object of the present invention, accordingly, is to provide a
voice data transmitting and receiving system capable of allowing for the
recognizing of the meaning of speech even in a deteriorated
communication path environment.

Another object of the present invention is to provide a voice data
transmitting and receiving system capable of allowing for the recognizing
of the meaning of speech irrespective of packet loss due to
causes in the communication path.

According to a first aspect of the present invention, there is
provided a voice data transmitting and receiving system for transmitting
and receiving voice data as packet data via a network, wherein: on the
transmission side voice clauses are divided and transmitted as packet data
in divided clause units, and on the reception side the voice data is
outputted as voice based on the received packet data in clause units.

According to a second aspect of the present invention, there is
provided a voice data transmitting and receiving system, wherein: on the
transmission side: real-time communication packets (RTPs) are generated
based on input voice data; the input voice data is divided into clause
units; and a plurality of RTP voice data in the clause units are transferred
as packet data to a communication path; and on the reception side: packet
data in clause units are obtained from packeted received data received via
the communication path, thereby producing a replica of the RTPs in clause
units; and the voice data is outputted as voice based on the replica of the
RTPs.

According to a third aspect of the present invention, there is
provided a voice data transmitting and receiving system, wherein:
on the transmission side: real-time communication packets are generated
based on input voice data; the input voice data is divided into clause
units; and a plurality of voice data RTPs in the clause units are combined
into a single packet data and transferred to a communication path; and on
the reception side: packet data in clause units are obtained from packeted
received data received via the communication path, thereby producing a
replica of the RTPs in clause units; and the voice data is outputted as
voice based on the plurality of RTPs.

The data sent out from the transmission side is in the form of a file.
On the reception side, either a re-transfer request is issued upon
recognizing missing received data, or an interpolation process on the
received data is executed based on the received file data. The file data
sent out from the transmission side is provided with discrimination data.
On the reception side, the transmission side data is taken out from the
received file data based on the discrimination data. The voice is divided
into clauses based on voice recognition, on an externally provided
instruction, on the sound level of the input voice, on changes in the
input voice pitch level, on measured movement of the user's lips, or on
measured vibrations of the user's throat. The systems are selected based
on the extent of communication per unit time between the transmission and
reception sides.

According to a fourth aspect of the present invention, there is
provided a method of transmitting and receiving voice data as packet data
via a network, wherein voice clauses are divided and transmitted as packet
data in divided clause units on a transmission side, and the voice data is
outputted as voice based on the received packet data in clause units on a
reception side.

According to a fifth aspect of the present invention, there is
provided a voice data transmitting and receiving method, wherein:
on a transmission side, real-time communication packets are generated
based on input voice data, the input voice data is divided into clause
units, and a plurality of RTP voice data in the clause units are
transferred as packet data to a communication path; and on a reception
side, packet data in clause units are obtained from packeted received data
for producing a replica of the RTPs in clause units, and the voice data is
outputted as voice based on the replica of the RTPs.

According to a sixth aspect of the present invention, there is
provided a voice data transmitting and receiving method, wherein:
on a transmission side, real-time communication packets are generated
based on input voice data, the input voice data is divided into clause
units, and a plurality of voice data RTPs in the clause units are combined
into a single packet data and transferred to a communication path; and on
a reception side, packet data in clause units are obtained from packeted
received data for producing a replica of the RTPs in clause units, and the
voice data is outputted as voice based on the plurality of RTPs.

Other objects and features will be clarified from the following
description with reference to the attached drawings.

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 is a system structure of a voice data transmitting and
receiving system of a first embodiment according to the present invention;

Fig. 2 is a system structure of a voice data transmitting and
receiving system of a second embodiment according to the present
invention;

Fig. 3 is a system structure of a voice data transmitting and
receiving system of a third embodiment according to the present invention;

Fig. 4 is a system structure of a voice data transmitting and
receiving system of a fourth embodiment according to the present
invention; and

Fig. 5 is a view for describing the operation of the embodiment
shown in Fig. 4.

PREFERRED EMBODIMENTS OF THE INVENTION 

Preferred embodiments of the present invention will now be 
described with reference to the drawings. 

Fig. 1 is a system structure of a voice data transmitting and
receiving system of a first embodiment according to the present invention.
In this embodiment, the transmission side comprises a communication
terminal 11, a voice recognizer unit 12 and a packet combine unit 13, and
the reception side, which is connected to the transmission side via the
Internet or like communication channel, comprises a packet
division unit 21 and a communication terminal 22. While each user of
course has both transmitting and receiving functions for the purpose of
conversation, in the following description the transmission and reception
sides are dealt with separately.

On the transmission side, a user's voice inputted to a microphone
or like voice input device is processed as voice data in the communication
terminal 11. On the reception side, the communication terminal 22
processes the voice, and outputs the processed voice via a loudspeaker or
like voice output device.

On the transmission side, the communication terminal 11
generates real-time communication packets (hereinafter abbreviated as
RTPs) based on the input voice data. The voice recognizer unit 12
receives the voice data from the communication terminal 11, and executes
a voice recognition process to divide the voice into clause units. The
packet combine unit 13 combines a plurality of voice data RTPs in clause
units from the voice recognizer unit 12 into a single packet data to be sent
out to a communication path. Alternatively, the packet combine unit 13
may send out the voice data RTPs in clause units as they are, without
combining them.

On the reception side, the packet division unit 21 executes packet
division of packeted received data received via the communication path to
obtain RTPs of voice data in clause units, thus producing a replica of the
plurality of RTPs as clause units. The communication terminal 22
reproduces the transmission side voice data based on the plurality of
RTPs received from the packet division unit 21.

As shown above, in this embodiment clause units, i.e., divisions
that carry meaning in the composition of the voice, are discriminated, and
real-time communication packets are transmitted and received in the
discriminated clause units. Thus, even when packet loss occurs on the
communication path due to deterioration of the communication
environment from such a cause as communication line deterioration, the
meaning of each clause can be transmitted, and reliable data transfer is
possible.
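
The combining and dividing of RTPs in clause units described for the first embodiment can be illustrated with a minimal sketch. The wire layout used here (a 2-byte count followed by length-prefixed payloads) and the function names `combine_clause` and `divide_clause` are assumptions made for illustration only; the specification does not define a packet format.

```python
import struct

def combine_clause(rtp_payloads):
    """Pack the RTP payloads of one clause into a single packet.

    Hypothetical layout: 2-byte payload count, then for each payload
    a 2-byte length followed by the payload bytes (network byte order).
    """
    parts = [struct.pack("!H", len(rtp_payloads))]
    for payload in rtp_payloads:
        parts.append(struct.pack("!H", len(payload)))
        parts.append(payload)
    return b"".join(parts)

def divide_clause(packet):
    """Recover the list of RTP payloads from a combined clause packet."""
    (count,) = struct.unpack_from("!H", packet, 0)
    offset = 2
    payloads = []
    for _ in range(count):
        (length,) = struct.unpack_from("!H", packet, offset)
        offset += 2
        payloads.append(packet[offset:offset + length])
        offset += length
    return payloads
```

In this sketch the role of the packet combine unit 13 corresponds to `combine_clause` and that of the packet division unit 21 to `divide_clause`; a clause either arrives whole or not at all, which is the property the embodiment relies on.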

A second embodiment of the voice data transmitting/receiving
system according to the present invention will now be described with
reference to the block diagram of Fig. 2. In Fig. 2, parts having functions
like those in the case of Fig. 1 are designated by like reference numerals.

In this embodiment, the transmission side comprises a
communication terminal 11, a voice recognizer unit 12, a packet combine
unit 13 and a file producer (filing) unit 14, and the reception side, which is
connected to the transmission side via the Internet or like
communication path, comprises a packet division unit 21, a communication
terminal 22 and a file receiver unit 23.

On the transmission side, the communication terminal 11
generates RTPs based on the input voice data. The voice recognizer unit
12 executes a voice recognition process on the voice data from the
communication terminal 11 to divide the voice into clause units. The
packet combine unit 13 combines a plurality of voice data RTPs in clause
units to produce a single packet data to be sent out to the file producer unit
14. The file producer unit 14 produces a file from the received packets, and
sends out the file to the communication path.

On the reception side, the file receiver unit 23 receives the file
data via the communication path, and sends out the received file
data as packet data to the packet division unit 21. The file receiver unit
23 also recognizes missing received data, if any, from the received file
data, and either sends out a data re-transfer request to the transmission
side or executes an interpolating process on the received data so as to
prevent loss of data.

The packet division unit 21 executes packet division of data
received from the file receiver unit 23 to obtain the voice data RTPs in
clause units and reproduce a replica of a plurality of RTPs as a single
clause. The communication terminal 22 generates transmission side
voice data from the plurality of RTPs received from the packet division unit
21, and causes the generated data to be outputted as voice from the
loudspeaker.

The above second embodiment retains the advantage of the first
embodiment that the meaning of each clause is transferred and reliable
data transfer is obtainable even when packet loss occurs on the
communication path due to deterioration of the communication environment
stemming from such a cause as communication line deterioration. In
addition, it is possible to recognize missing received data on the basis of
the file data received by the file receiver unit 23, and either to send out
a data re-transfer request or to prevent
loss of data through an interpolating process on the received data.
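
As a minimal sketch of how a receiver might recognize missing data and choose between re-transfer and interpolation, assume each clause file carries numbered parts and a known part count. The sequence numbering, the `can_retransfer` flag, and the function names are illustrative assumptions; the specification does not state how the file receiver unit 23 detects gaps.

```python
def find_missing(received_seqs, expected_count):
    """Return the sequence numbers in 0..expected_count-1 that never arrived."""
    seen = set(received_seqs)
    return [seq for seq in range(expected_count) if seq not in seen]

def handle_file(received, expected_count, can_retransfer):
    """Decide how to react to gaps in one received clause file.

    `received` maps sequence number -> payload bytes (hypothetical layout).
    Returns ("ok", []), ("retransfer", missing) or ("interpolate", missing).
    """
    missing = find_missing(received.keys(), expected_count)
    if not missing:
        return ("ok", [])
    # Assumption: request re-transfer when the path allows it,
    # otherwise fall back to interpolating over the gap.
    action = "retransfer" if can_retransfer else "interpolate"
    return (action, missing)
```

For example, a file expected to hold three parts that arrives with parts 0 and 2 only would yield a request covering part 1.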

A third embodiment of the voice data transmitting/receiving system
according to the present invention will now be described with reference to
the block diagram of Fig. 3. In Fig. 3, parts having functions like those in
the case of Fig. 2 are designated by like reference numerals.

This embodiment is basically the same in arrangement and
operation as the above second embodiment shown in Fig. 2. This
embodiment is greatly effective in a case where a firewall 24 is
provided between the communication path and the reception side. In this
embodiment, the file producer unit 14 sends out the file data by making
use of a generally open port such as that used for HTTP or FTP, and, so
that the file can be discriminated from any other file, discrimination data
is provided after the file production.

On the reception side, the file receiver unit 23, which is connected
via the Internet or like communication path, takes out a file
transmitted from the file producer unit 14 from the full received file on
the basis of the discrimination data, and sends out the taken-out file to the
packet division unit 21. The file receiver unit 23, like the above case,
recognizes missing received data and either sends out a data re-transfer
request or prevents loss of data through an interpolation process on
the received data.

The third embodiment retains the advantages of the first and
second embodiments: the meaning of each clause is transferred and
reliable data transfer is obtainable irrespective of packet loss on the
communication path due to deterioration of the communication environment
stemming from such a cause as communication line deterioration, and loss
of data can be prevented by a data re-transfer process or a received data
interpolation process based on the recognition of missing received data
from the received file data. In addition, it is possible to communicate
with a communication terminal across a firewall.
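
The use of discrimination data to pick the clause files out of everything arriving on a shared open port can be sketched as follows. The 4-byte tag `MARKER` and the helper names are hypothetical; the specification says only that discrimination data is provided, not what form it takes.

```python
MARKER = b"VCLS"  # hypothetical discrimination tag, not from the specification

def tag_file(clause_file):
    """Attach the discrimination tag to a produced clause file."""
    return MARKER + clause_file

def extract_clause_files(received_files):
    """Keep only the files carrying the tag, with the tag stripped."""
    return [f[len(MARKER):] for f in received_files if f.startswith(MARKER)]
```

In this sketch `tag_file` stands in for the file producer unit 14 and `extract_clause_files` for the file receiver unit 23; unrelated traffic on the same port, such as ordinary web content, is simply ignored.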

Further embodiments of the present invention will now be
described, which are different forms of a voice clause separation (or
discrimination) system.

In a fourth embodiment of the present invention, signals
representing manual clause divisions are outputted. It is thus possible to
input necessary divisions according to a person's judgment by using a
manual clause division device.

With this embodiment, division data can be inputted in any
environment. Thus, the embodiment can be used not only for voice but
also for music and continuous tones. Furthermore, the embodiment can
be used for other RTP communications such as image communication.

In a fifth embodiment of the present invention, voice clause
divisions are determined based on the measured input sound level. More
specifically, the input sound level is measured, and an instant when the
measured level falls to a particular level is determined to be a
division. The particular level in this case may be the
noise level when the utterance comes to a pause.

This embodiment permits automatically dividing clauses at natural 
divisions in the utterance. 
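
A minimal sketch of level-based clause division is given here, assuming the input has already been reduced to one level value per frame. The `min_pause` parameter (the number of consecutive quiet frames required before a division is declared) is an assumption added so that brief dips do not split a clause; the specification mentions only comparison against a particular level.

```python
def split_by_level(levels, noise_level, min_pause=3):
    """Split frames into clauses at pauses in the sound level.

    A pause is a run of at least `min_pause` consecutive frames whose
    level is at or below `noise_level`. Returns a list of half-open
    (start, end) frame ranges, one per clause.
    """
    clauses = []
    start = None   # first frame of the clause being collected
    quiet = 0      # length of the current run of quiet frames
    for i, level in enumerate(levels):
        if level > noise_level:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_pause:
                clauses.append((start, i - quiet + 1))
                start = None
                quiet = 0
    if start is not None:
        # Close the final clause, dropping any trailing quiet frames.
        clauses.append((start, len(levels) - quiet))
    return clauses
```

Each returned range could then be handed to the packet combine unit as one clause unit.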

In a sixth embodiment of the present invention, voice clause
divisions are determined based on the measured input sound pitch.
Specifically, the input sound pitch is measured, and an instant when a pitch
difference exceeds a constant value is determined to be a division.

This embodiment permits automatically discriminating divisions
of utterance irrespective of whether the background noise level is high.
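
The pitch-based rule can be sketched in a few lines, assuming one pitch estimate per frame is already available (pitch estimation itself is outside this sketch). The function name and the choice to compare adjacent frames are assumptions; the specification states only that a pitch difference exceeding a constant value marks a division.

```python
def split_by_pitch(pitches, threshold):
    """Return the frame indices at which a new clause is taken to start.

    Frame i is a division when the pitch jumps by more than `threshold`
    relative to frame i-1.
    """
    return [
        i for i in range(1, len(pitches))
        if abs(pitches[i] - pitches[i - 1]) > threshold
    ]
```

Because the rule looks only at pitch changes, it does not depend on the absolute sound level, which matches the stated advantage in noisy surroundings.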

In a seventh embodiment of the present invention, voice clause
divisions are determined based on the movement of the lips by making
image measurement of the face of a person during voice input. In other
words, image measurement of a person during voice input is made, and
an instant when the movement of the lips stops is determined
to be a division.

With this embodiment, divisions are determined with a mechanism
different from the voice process, and it is thus possible to automatically
discriminate divisions without any appropriate voice discriminator.

In an eighth embodiment of the present invention, voice clause
divisions are determined based on the measured vibrations of the throat.
Specifically, vibrations of the throat are measured, and an instant when the
vibration stops is determined to be a division.

With this embodiment, divisions are determined with a
mechanism different from the voice process, and it is thus possible to
automatically discriminate divisions without any appropriate voice
discrimination. The embodiment can also be used in the case of an
extremely low voice level.

In a ninth embodiment of the present invention, voice clause
divisions are determined by discriminating and analyzing the
voice as a composition. Specifically, the voice is analyzed as a
composition, and proper divisions are determined. Well-known techniques
may be used for analyzing voice into compositions.

With this embodiment, it is possible to automatically determine
divisions from meanings even in an environment in which the above
methods cannot be utilized, for instance with flat and long
continuous voice.

A tenth embodiment of the present invention will now be 
described. 

This embodiment is installed on the transmission side (or
reception side), and informs both the transmitting and receiving
communication terminals of an optimum communication means by observing
the communication status.

Fig. 4 is a block diagram showing the present embodiment.
This embodiment comprises a transmission/reception monitor unit
31 for sensing the start and end of communication, a communication time
storage unit 32 for accumulating communication time, a communication
extent storage unit 33 for accumulating the quantity of transmitted or
received data, a reference value/corresponding means storage unit 34 for
storing reference values for switching communication means and also
these communication means, a comparative computer unit 35 for
calculating the communication extent from the outputs of the
communication time storage unit 32 and the communication extent storage
unit 33 and comparing the calculated value with the reference values
stored in the reference value/corresponding means storage unit 34, and a
communication means informing unit 36 for receiving a communication
means from the comparative computer unit 35 and commanding the
switching to the received communication means.

The operation of the embodiment will now be described with 
reference to Fig. 5. 

When the communication is started, the transmission/reception
monitor unit 31 senses the start of communication, and causes the
communication time storage unit 32 and the communication extent storage
unit 33 to start their respective accumulations.

Marked-Up Version of Substitute Specification - 10/720,135

Whenever a constant time
passes, the data stored in the communication time storage unit 32 and the
communication extent storage unit 33 are sent out to the comparative
computer unit 35, and the accumulated data in the communication time
storage unit 32 and the communication extent storage unit 33 are deleted.
The comparative computer unit 35 computes the extent of communication
per unit time from the data sent out from the communication time storage
unit 32 and the communication extent storage unit 33, compares the result
of calculation with the reference values stored in the reference
value/corresponding means storage unit 34, and sends out the data of the
corresponding communication means to the communication means
informing unit 36. The communication means informing unit 36 sends out
a command for switching to the selected communication means to the
communication terminal. When the communication is ended, the
transmission/reception monitor unit 31 senses the end of communication,
and notifies the communication time storage unit 32 and the communication
extent storage unit 33 of the end of accumulation and the deletion of the
stored values. As shown, in this embodiment the above systems can
be selectively used based on the extent of communication per unit time
between the transmission and reception sides.

With this embodiment, the transmission and reception sides can
communicate using the optimum communication means matched to the
environment of the communication path. As examples of the
communication means, the RTP communication may be selected normally,
the clause division packet communication may be selected in a bad
communication path environment, and the file production communication
may be selected in a still worse communication path environment.
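
The comparison performed by the comparative computer unit 35 can be sketched as a lookup against a table of reference values. The threshold figures, the means names, and the assumption that a lower extent of communication per unit time indicates a worse path are all illustrative; the specification does not give reference values or state the direction of the comparison.

```python
# Hypothetical reference table: (minimum bytes per second, communication means),
# ordered from the best path condition to the worst.
REFERENCE_TABLE = [
    (8000, "rtp"),
    (2000, "clause_packet"),
    (0, "file"),
]

def select_means(bytes_transferred, seconds, table=REFERENCE_TABLE):
    """Pick a communication means from the extent of communication per unit time."""
    if seconds <= 0:
        raise ValueError("accumulated communication time must be positive")
    rate = bytes_transferred / seconds
    for min_rate, means in table:
        if rate >= min_rate:
            return means
    # Rates below every reference value fall through to the last entry.
    return table[-1][1]
```

Calling `select_means` once per accumulation interval, and then clearing the accumulated time and data quantity, would mirror the cycle described for units 32, 33 and 35.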

The arrangements and operations of the preferred embodiments
have been described above. However, these embodiments are merely
examples of the present invention and are by no means limitative. It will
be readily understood by a person skilled in the art that various
changes and modifications are possible depending on particular uses
without departing from the scope of the present invention.

As has been described in the foregoing, with the voice data
transmission/reception system according to the present invention, not only
is it possible to obtain transfer of the meaning of each clause and
reliable data transfer even when packet loss occurs on the
communication path due to deterioration of the communication environment
stemming from such a cause as communication line deterioration, but it
is also possible to send out a data re-transfer request upon recognizing
missing received data, to prevent loss of data by a received data
interpolation process, and to communicate with a communication terminal
in the presence of a firewall.

Furthermore, the transmitting and receiving communication
terminals can communicate using a proper communication means
matched to the environment of the communication path.

Changes in construction will occur to those skilled in the art and
various apparently different modifications and embodiments may be made
without departing from the scope of the present invention. The matter set
forth in the foregoing description and accompanying drawings is offered by
way of illustration only. It is therefore intended that the foregoing
description be regarded as illustrative rather than limiting.

ABSTRACT OF THE DISCLOSURE

Voice clauses are divided and transmitted as packet data in
divided clause units on a transmission side. The voice data is outputted
as voice based on the received packet data in clause units on a reception
side. Thus, the meaning of speech can be recognized even in a
deteriorated communication path environment.